[AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD #12282

rasmith · 2025-01-21T21:26:08Z

We can no longer run int8 models on AMD; it's completely broken. This is because there is no TritonScaledMMLinearKernel class.

I added TritonScaledMMLinearKernel. Since there is no Triton kernel that handles asymmetric int8 quantization, the new TritonScaledMMLinearKernel checks for this case and returns a failure.
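
As a rough illustration of the check described above, here is a minimal sketch. It is not the code from this PR: `LayerConfigSketch`, `TritonKernelSketch`, the `input_symmetric` field, and the `(ok, reason)` return shape are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LayerConfigSketch:
    # Hypothetical stand-in for the layer's quantization settings.
    input_symmetric: bool


class TritonKernelSketch:
    """Illustrative only; not the class added in this PR."""

    @classmethod
    def can_implement(
        cls, config: LayerConfigSketch
    ) -> Tuple[bool, Optional[str]]:
        # There is no Triton kernel for asymmetric int8, so this case is
        # rejected up front with a reason instead of failing later.
        if not config.input_symmetric:
            return False, "asymmetric int8 quantization is not supported"
        return True, None
```

Under these assumptions, a caller can inspect the returned reason and fall back to another kernel or raise a clear error.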

I tested and everything seems to work.

Signed-off-by: Randall Smith <[email protected]>

github-actions · 2025-01-21T21:26:20Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

mgoin

Sorry for the disruption and thanks for the fix. Could you add an int8 test to the AMD runner to make sure this doesn't regress in the future?

rasmith · 2025-01-21T22:45:40Z

> Sorry for the disruption and thanks for the fix. Could you add an int8 test to the AMD runner to make sure this doesn't regress in the future?

@mgoin Where is the AMD runner?

mgoin · 2025-01-21T22:57:17Z

I'm AFK right now, but you can look at the tests that have mirror hardware amd; each of these will run on the AMD runner.

https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml#L93


rasmith · 2025-01-22T20:46:23Z

> Sorry for the disruption and thanks for the fix. Could you add an int8 test to the AMD runner to make sure this doesn't regress in the future?

@mgoin I put a test in test_triton_scaled_mm.py that just loads a small model and runs it. It only runs when the current platform is ROCm.
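
For readers unfamiliar with platform-gated tests, the gating described above can be sketched as follows. This is not the test from the PR: `is_rocm` is an assumed helper that inspects `torch.version.hip`, the sketch uses `unittest` rather than the project's own test setup, and the test body is a placeholder for loading and running a small int8 model.

```python
import unittest


def is_rocm() -> bool:
    """Assumed helper: True when the installed PyTorch is a ROCm (HIP) build."""
    try:
        import torch
    except ImportError:
        return False
    # torch.version.hip is None on builds without ROCm support.
    return getattr(torch.version, "hip", None) is not None


@unittest.skipUnless(is_rocm(), "int8 regression test only runs on ROCm")
class Int8RocmRegressionSketch(unittest.TestCase):
    def test_small_model_runs(self):
        # Placeholder: a real test would load a small int8 model and run it.
        self.assertTrue(is_rocm())
```

On any other platform the whole class is reported as skipped, so the suite stays green without exercising the ROCm-only path.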

mgoin · 2025-01-22T20:50:01Z

Thank you, looks good!

TritonScaledMMLinearKernel implementation

9e8bad6

Signed-off-by: Randall Smith <[email protected]>

rasmith requested review from mgoin, robertgshaw2-redhat and tlrmchlsmth as code owners January 21, 2025 21:26

mgoin approved these changes Jan 21, 2025


robertgshaw2-redhat approved these changes Jan 21, 2025


Add regression test for rocm w8a8

daf9a71

Signed-off-by: Randall Smith <[email protected]>

rasmith requested a review from WoosukKwon as a code owner January 22, 2025 18:32

rasmith added 2 commits January 22, 2025 18:50

remote unused import

9c11d5c

Signed-off-by: Randall Smith <[email protected]>

ruff

4e4d633

Signed-off-by: Randall Smith <[email protected]>

mgoin approved these changes Jan 22, 2025


mgoin added the quantization and ready labels Jan 22, 2025

mgoin enabled auto-merge (squash) January 22, 2025 20:50

mgoin merged commit 68c4421 into vllm-project:main Jan 23, 2025
66 checks passed
